Sanskrit Sandhi Splitting using seq2(seq)^2
Abstract
In Sanskrit, small words (morphemes) are combined through a morphophonological process called Sandhi to form compound words. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although rules governing the splitting of words exist, it is highly challenging to identify the location of the splits in a compound word, because the same compound word might be broken down in multiple ways that each yield syntactically correct splits. Existing systems explore incorporating these pre-defined splitting rules, but have low accuracy since they do not address the fundamental problem of identifying the split location. With this work, we propose a novel Double Decoder RNN (DD-RNN) architecture which (i) predicts the location of the split(s) with an accuracy of 95% and (ii) predicts the constituent words (i.e. learns the Sandhi splitting rules) with an accuracy of 79.5%. To the best of our knowledge, deep learning techniques have never been applied to the Sandhi splitting problem before. We further demonstrate that our model significantly outperforms the previous state of the art.
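The ambiguity described in the abstract can be illustrated with a minimal Python sketch. It is not the DD-RNN from the paper: it is a naive rule-based enumerator, and `TOY_RULES` and `candidate_splits` are hypothetical names introduced here. The rule table is a tiny illustrative subset of vowel sandhi (in IAST transliteration), assumed only for this example.

```python
import unicodedata

# Toy subset of vowel sandhi rules (an assumption for illustration only):
# merged vowel -> possible (ending of left word, beginning of right word).
TOY_RULES = {
    "ā": [("a", "a"), ("a", "ā"), ("ā", "a"), ("ā", "ā")],
    "e": [("a", "i")],
    "o": [("a", "u")],
}


def candidate_splits(word):
    """Enumerate every two-way split that the toy rules allow."""
    # Normalize so that each long vowel is a single code point.
    word = unicodedata.normalize("NFC", word)
    candidates = []
    # Skip the first and last characters so both parts keep a stem.
    for i in range(1, len(word) - 1):
        for left_end, right_start in TOY_RULES.get(word[i], []):
            left = word[:i] + left_end
            right = right_start + word[i + 1:]
            candidates.append((left, right))
    return candidates


if __name__ == "__main__":
    for left, right in candidate_splits("vidyālaya"):
        print(left, "+", right)
```

Even with this small rule table, the single word "vidyālaya" yields four candidate splits, of which only vidyā + ālaya is the intended one. Rules alone therefore do not settle where or how to split, which is the gap the abstract says a learned split-location predictor is meant to close.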
Related resources
A Sandhi Splitter for Malayalam
Sandhi splitting is the primary task for computational processing of text in Sanskrit and Dravidian languages. In these languages, words can join together with morpho-phonemic changes at the point of joining. This phenomenon is known as Sandhi. Sandhi splitter splits the string of conjoined words into individual words. Accurate execution of sandhi splitting is crucial for text processing tasks ...
Comparative Analysis of Single-Cell RNA Sequencing Methods.
Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected t...
Computational Algorithms Based on the Paninian System to Process Euphonic Conjunctions for Word Searches
Searching for words in Sanskrit E-text is a problem that is accompanied by complexities introduced by features of Sanskrit such as euphonic conjunctions or ‘sandhis’. A word could occur in an E-text in a transformed form owing to the operation of rules of sandhi. Simple word search would not yield these transformed forms of the word. Further, there is no search engine in the literature that can...
An Ontology for Comprehensive Tutoring of Euphonic Conjunctions of Sanskrit Grammar
Euphonic conjunctions (sandhis) form a very important aspect of Sanskrit morphology and phonology. The traditional and modern methods of studying euphonic conjunctions in Sanskrit follow different methodologies. The former involves a rigorous study of the Pāṇinian system embodied in Pāṇini’s Aṣṭādhyāyī, while the latter usually involves the study of a few important sandhi rules with the u...
Design & Analysis of an Exhaustive Algorithm for Sandhi Processing in Sanskrit
It is almost impossible to learn a new language without the study of its grammar. Automated language processing is centrally focused on facilitating reference to the increasingly available Sanskrit E-texts. For learning the Sanskrit language, the study of its grammar plays a very important role. The proposed research paper presents a fresh and new approach to processing Sandhi...
Publication date: 2018